First, I will make a quick analysis of the Brazilian market to help choose a data sample that describes the whole market well. A smaller sample also improves computing performance when testing and evaluating models. At this first stage I only mined and prepared the data for the forecast analysis and standardized a few variables.
A remark on how I created the Average Pack Size groups: I used a boxplot to set the boundaries of each group.
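A minimal sketch of how boxplot statistics could define such groups, assuming the cut points are the quartiles that a boxplot draws as the edges of its box. The function names, the three group labels, and the sample pack sizes are hypothetical and only illustrate the idea.

```python
import statistics

def quartile_bounds(values):
    """Return (Q1, median, Q3), the values a boxplot draws as its box."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3

def pack_size_group(size, q1, q3):
    """Assign a pack size to a group, using the box edges as cut points."""
    if size < q1:
        return "small"
    if size <= q3:
        return "medium"
    return "large"

# Hypothetical average pack sizes, for illustration only
pack_sizes = [50, 80, 100, 100, 120, 150, 200, 250, 400]
q1, median, q3 = quartile_bounds(pack_sizes)
groups = [pack_size_group(s, q1, q3) for s in pack_sizes]
```

Other cut points (for example the whisker limits) would work the same way; only the comparison values change.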
The first chart below, with volume and value normalized, helps us understand the true movement of the market. In the first semester of 2016, market volume decreased more sharply than market value, creating an intersection point. After June 2016 volume decreased more slowly than before, as the slope of the blue line shows. One open question is whether price affected volume after June 2016, since the red line is almost flat. We can also conclude that the price increase has been constant.
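The text does not say which normalization was applied. As a rough sketch, assuming min-max scaling to the range [0, 1], two series with very different magnitudes can be put on the same axis like this. The monthly figures are hypothetical.

```python
def min_max_normalize(series):
    """Scale a series to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(x - lo) / (hi - lo) for x in series]

# Hypothetical monthly market totals, for illustration only
volume = [1200.0, 1100.0, 900.0, 850.0]
value = [5000.0, 4900.0, 4700.0, 4650.0]

volume_norm = min_max_normalize(volume)
value_norm = min_max_normalize(value)
```

Once both series share a scale, their slopes can be compared directly even though the raw units differ.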
To quickly understand which flavor has the most impact on the market, look at the chart below: the Milk Chocolate flavor clearly has a huge impact on the market result. It could therefore be interesting to analyze the flavors separately; to do that, three flavors were selected by performance.
Now you can check the same data without normalization. I had to use three separate charts because the range of values is large; plotting everything in a single chart would obscure the information.
Analysing the market by packaging material, it is easy to see that plastic predominates. The column chart on the left shows market value and the one on the right shows market volume, both by packaging material. At the end of 2017 there seems to be an inverse correlation, or perhaps just a coincidence, possibly because campaigns against plastic are increasing nowadays.
Caloric content has always been dominated by sugar, as the chart below shows.
Two package sizes lead the market over the others.
Having understood the market, we can select the most influential categories to run the forecast model. This usually helps computing performance because the model works with less data.
Another way to read the table is to plot it as a line chart by year. The decreasing movement is not apparent from the shape of the monthly lines alone, but it becomes visible when we look at the y axis. The charts confirm the information in the table above.
Two more factors could have a strong influence on the market: Coverage and Shelf Life. The mean was calculated by year and month. As the charts show, coverage has been decreasing since January 2015, so it may be one of the factors with a strong influence on the market. Shelf life overall stays in the same range, with clear seasonality at the end of the year. The histogram and density chart help us detect any outliers that could distort the mean.
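The aggregation described above can be sketched as a mean per (year, month) pair. The outlier check is an addition of mine: the analysis relies on a histogram and a density chart, while this sketch swaps in the numeric 1.5 x IQR rule as an assumed stand-in. All records and names are hypothetical.

```python
from collections import defaultdict
import statistics

def mean_by_year_month(records):
    """records: iterable of (year, month, value). Returns {(year, month): mean}."""
    buckets = defaultdict(list)
    for year, month, value in records:
        buckets[(year, month)].append(value)
    return {key: statistics.mean(vals) for key, vals in sorted(buckets.items())}

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule, not the author's)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical coverage observations: (year, month, coverage)
coverage = [(2015, 1, 0.90), (2015, 1, 0.80), (2015, 2, 0.70), (2015, 2, 0.80)]
monthly_mean = mean_by_year_month(coverage)

# Hypothetical shelf-life values in days, with one suspicious entry
outliers = iqr_outliers([10, 11, 12, 11, 10, 12, 11, 60])
```

Checking for outliers before averaging matters here because a single extreme value can shift a monthly mean noticeably.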
To help identify strong correlations between the variables, the two charts below should clarify any issues that may appear. We can conclude:
We are now able to choose the best attributes to forecast with less data while still describing the market well. The attributes are:
We cannot assume a cluster with these attributes combined, because each analysis was made independently. So a row is accepted if it has at least one of these three attributes. The resulting cluster represents 97% of the total volume and 96% of the total value.
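A small sketch of the "at least one attribute" rule and of how the share of volume or value kept by the filter could be measured. The column names, the selected values, and the rows are all hypothetical; the real attribute list is not reproduced here.

```python
def matches_any(row, selected):
    """True if the row carries at least one of the selected attribute values."""
    return any(row.get(col) in values for col, values in selected.items())

def coverage_share(rows, selected, measure):
    """Fraction of the total `measure` retained by the filtered rows."""
    total = sum(r[measure] for r in rows)
    kept = sum(r[measure] for r in rows if matches_any(r, selected))
    return kept / total if total else 0.0

# Hypothetical selection: one value per attribute, for illustration only
selected = {
    "flavor": {"Milk Chocolate"},
    "material": {"Plastic"},
    "calorie_source": {"Sugar"},
}

# Hypothetical rows
rows = [
    {"flavor": "Milk Chocolate", "material": "Glass", "calorie_source": "Sweetener", "volume": 60, "value": 300},
    {"flavor": "Cherry", "material": "Plastic", "calorie_source": "Sweetener", "volume": 30, "value": 150},
    {"flavor": "Vanilla", "material": "Glass", "calorie_source": "Sweetener", "volume": 10, "value": 50},
]

volume_share = coverage_share(rows, selected, "volume")
value_share = coverage_share(rows, selected, "value")
```

Using `any` rather than `all` reflects that the attributes were analysed independently, so a row does not need to satisfy every condition at once.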
The forecast model I used is Prophet, developed by Facebook. I will train the Prophet model on the data and check whether the error is acceptable before moving on.
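Prophet expects its input as two columns named `ds` (date) and `y` (value). The sketch below only prepares data in that shape and holds out the last months for error analysis; it does not call Prophet itself. The helper names, the three-month hold-out, and the monthly figures are assumptions for illustration.

```python
from datetime import date

def to_prophet_rows(monthly):
    """monthly: list of (date, value). Returns rows keyed by Prophet's ds/y names."""
    return [{"ds": d.isoformat(), "y": float(v)} for d, v in sorted(monthly)]

def chronological_split(rows, holdout):
    """Keep the last `holdout` rows for testing; never shuffle a time series."""
    if not 0 < holdout < len(rows):
        raise ValueError("holdout must be between 1 and len(rows) - 1")
    return rows[:-holdout], rows[-holdout:]

# Hypothetical monthly share volume for one brand in 2017
history = [(date(2017, m, 1), 100 + m) for m in range(1, 13)]
rows = to_prophet_rows(history)
train, test = chronological_split(rows, holdout=3)
```

The training rows would then be loaded into a data frame and fitted, and the forecast for the held-out months compared with `test` to judge whether the error is acceptable.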
In the last month of 2017, the brand Lili dropped by 64.8% in share volume and Gen rose by 25.4% in share volume. The Price Index shows that price has a strong influence: Lili raised its price by 40% over the market price, while the Glen brand dropped its price dramatically, by 23% of the market price.
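The exact definitions behind these figures are not given, so the following is only a sketch under two assumptions: share volume is brand volume divided by market volume, and the Price Index is the brand's average price relative to the market's average price, scaled so that 100 means parity. The numbers are hypothetical.

```python
def volume_share(brand_volume, market_volume):
    """Brand volume as a fraction of total market volume."""
    return brand_volume / market_volume

def price_index(brand_value, brand_volume, market_value, market_volume):
    """Brand average price relative to market average price (100 = parity)."""
    brand_price = brand_value / brand_volume
    market_price = market_value / market_volume
    return 100.0 * brand_price / market_price

# Hypothetical month: the market sells 1000 units for 5000, the brand 100 units for 700
share = volume_share(100, 1000)
index = price_index(700, 100, 5000, 1000)
```

Under this assumed definition, an index of 140 corresponds to a brand priced 40% above the market, and an index below 100 to a brand priced under the market.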
Lili was chosen for the forecast; even though its latest share volume dropped suddenly, this should not reflect the long term.
I adopted RMSE and MAPE to evaluate the model results. The dataset is not big enough to give a confident accuracy estimate; I would need a more powerful computer to consider a larger dataset.
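For reference, a plain-Python sketch of these error metrics using their standard definitions. RMSLE is included because the results table names it. The actual and predicted values are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error, in the units of the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmsle(actual, predicted):
    """Root mean squared logarithmic error; inputs must be greater than -1."""
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted))
        / len(actual)
    )

# Hypothetical hold-out values and forecasts
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(f"RMSE={rmse(actual, predicted):.3f}  MAPE={mape(actual, predicted):.2f}%")
```

MAPE and RMSLE are relative measures, so they are easier to compare across brands with very different volumes than RMSE is.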
The Cherry flavor had a more even distribution of share volume in 2015. In the following year the Harley brand varied widely throughout the year, which calls for a deeper analysis to understand such unusual movements.
I will choose the Harley brand to apply the forecast model for 2018.
The forecast model achieved a good error result.
| Error metric | Value |
|---|---|
| RMSLE | 1.22 |
| MAPE | 1.62 |